Disclaimer:
All data are correct as at time of publishing and sourced from the official Ministry of Health GitHub repo. The author will not be responsible for any loss or injury arising from the use of the information published here. The views presented here are reflections of the author’s thoughts and are not to be taken as recommendations. While the author has attempted to ensure there are no errors or mistakes in the processing and analysis of the data, it is solely the responsibility of the reader to verify and validate the information provided. Any opinions or work derived from the information presented here is not the responsibility of the author.
Data source:
https://github.com/MoH-Malaysia/covid19-public
https://github.com/CITF-Malaysia/citf-public
Source code:
https://github.com/amirmazmi/covid19-malaysia
All charts are interactive; see the top right of each chart for the available options. Lines can also be toggled by clicking on the legend.
On 13th July 2021, daily cases reached a new high, exceeding 10,000 reported cases per day. Over the following days, case numbers kept rising and passed the 17,000 mark. Despite an accelerating vaccination program, the rising cases have caused grave concern in the country, and one of the efforts by the MoH was to publish the data.
14 days rolling window and 2 Std Dev
In this chart, a rolling mean and rolling standard deviation (Bollinger Bands) are applied to the daily cases as an indicator of trend formation. Generally used in stock and forex trading, Bollinger Bands are simple enough to be calculated in Excel. The two key features to watch are whether the data points track the upper or lower band, and whether the upper and lower bands are widening or contracting.
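The band calculation described above can be sketched in a few lines. This is an illustrative Python version, not the author's R code; the function name `bollinger_bands`, the use of the sample standard deviation, and the made-up daily counts are all assumptions.

```python
from statistics import mean, stdev


def bollinger_bands(values, window=14, num_sd=2.0):
    """Return (mid, upper, lower) lists aligned with `values`.

    Entries are None until a full window of observations is available.
    Assumption: the sample standard deviation is used.
    """
    mid, upper, lower = [], [], []
    for i in range(len(values)):
        if i + 1 < window:
            mid.append(None)
            upper.append(None)
            lower.append(None)
            continue
        chunk = values[i + 1 - window:i + 1]  # the last `window` observations
        m = mean(chunk)
        s = stdev(chunk)
        mid.append(m)
        upper.append(m + num_sd * s)
        lower.append(m - num_sd * s)
    return mid, upper, lower


# Made-up daily case counts, for illustration only.
daily_cases = [900, 950, 1000, 1100, 1050, 1200, 1300, 1250,
               1400, 1500, 1650, 1600, 1800, 2000, 2100, 2300]
mid, upper, lower = bollinger_bands(daily_cases)
print(daily_cases[-1], mid[-1], upper[-1], lower[-1])
```

A data point close to `upper[-1]` corresponds to the "tracking the upper band" behaviour discussed in the text, while the gap between `upper` and `lower` is the band width.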
Observations:
As of 30th July 2021, the indication is that the trend has not subsided; a reversal would show as a drop in daily cases together with a reduction in the width of the bands. An example of this can be seen around 1st June 2021, when the phased implementation of MCO 3.0 across the nation, starting with Kelantan on 16th April 2021, helped reduce the case numbers.
14 days rolling window and 2 Std Dev
Observations:
As more and more of the population are being vaccinated, at some point there will be a transition to focus more on the death numbers (or specifically category 3-5 cases) instead of the daily cases since vaccination does not fully prevent infections but does reduce the likelihood of being severely ill.
It is important to note that vaccination will not remove the need to control infection rates, as there are sections of the population that have not been, and potentially will not be, vaccinated. Vaccinated parents may still be infected and spread the virus to their children. This still presents a huge risk, and it is the firm hope that common sense will prevail.
14 days rolling window and 2 Std Dev
Observations:
For point 2, it is important to note that the total test number does not necessarily count unique individuals, i.e. the same person may take two different types of test in a day, as was seen previously in some outbreak areas in order to quickly identify and isolate those potentially infected.
*RT-PCR tests require 1-2 days for results while RTK antigen can be completed in a few hours. These depend heavily on lab capacity and logistics. Recently introduced lateral flow test kits can produce results within 10-30 minutes using either a swab or saliva sample and can be done at home.
Generally, Bollinger Bands are used to describe the price movement (forex/stocks) where the price is the balance between the pull of supply (sell) and demand (buy). It is indicative of:
As Bollinger Bands are simply rolling averages and standard deviations, they are good indicators to reflect the variability of the data within the rolling window. Specifically, they reflect where the current data point is in relation to the previous x data points.
When applied to the daily cases, the bands describe how large the number of newly infected people is or, in terms of the SIR compartmental model, the number of people moving from S (susceptible) to I (infected). While it would be easy to say that a larger number today than yesterday obviously means more people are getting infected, it is important to quantify the change. In this case, daily infections are tracking the upper band, i.e. they sit approximately 2 standard deviations above the mean of the last 14 days of data, which is consistent with exponential growth.
In the daily deaths chart, this is the number of people moving from the I (infected) to R (removed, through recovery or death - this is differentiated in a SIRD model).
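To make the compartment flows concrete, here is a minimal discrete-time SIRD step in Python. It is only a sketch of the textbook model, not something fitted to the Malaysian data; the rates `beta`, `gamma` and `mu` and the starting values are hypothetical.

```python
def sird_step(s, i, r, d, beta, gamma, mu):
    """Advance a SIRD model by one day and return the new (s, i, r, d).

    beta  : transmission rate (S -> I), hypothetical
    gamma : recovery rate (I -> R), hypothetical
    mu    : death rate (I -> D), hypothetical
    """
    living = s + i + r
    new_infections = beta * s * i / living  # flow seen in the daily cases chart
    new_recoveries = gamma * i
    new_deaths = mu * i                     # flow seen in the daily deaths chart
    return (
        s - new_infections,
        i + new_infections - new_recoveries - new_deaths,
        r + new_recoveries,
        d + new_deaths,
    )


# Made-up starting point: 990 susceptible, 10 infected.
state = (990.0, 10.0, 0.0, 0.0)
for day in range(30):
    state = sird_step(*state, beta=0.3, gamma=0.1, mu=0.01)
print(state)
```

In a plain SIR model the recovery and death flows are merged into a single "removed" compartment, which is the simplification mentioned in the text.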
The rolling window of 14 days was selected based on the smoothness of the moving average and standard deviations, as well as its applicability across all three graphs. Any window below 30 days yields broadly similar results; however, as the window grows, resolution is lost because the bands become less sensitive (lagging response).
Each circle represents daily data. Cumulatives (or totals) are used as the cases and deaths are not time matched per observation.
Observations:
We can make some assumptions that deaths are driven by multiple external factors which are not related to the individual. Potentially a model could be built to quantify these relationships better. A regression model with the inclusion of a logistic growth term (an SIRD model also includes this term) would be a good starting point, although the aim here would be to investigate the relationship between resources (specifically the effect of vaccination rates and hospital capacity) and deaths.
It is important to note that most effects are exponential, e.g. infection rates and death rates, which is why earlier efforts to flatten the curve were critical and effective. Unfortunately, the effort was not sustained.
There has been much discussion about the positive test rate, for which WHO (World Health Organization) guidelines state a maximum of 5% for effective coverage. What is important to note is that the daily case data are not time matched with the testing data*, as the daily case numbers are recorded when the test labs complete the results. Therefore, some variability is expected when relating the daily case numbers directly to the daily testing numbers. Ideally, the positive test data per day would be published (with expected delays); in the absence of that data, this is the best that can be done. This is worth noting because the data collection process affects accuracy, and any insights gleaned from this view are subject to these errors.
Any tests conducted in public and private labs are required to be recorded in the SIMKA (Sistem Informasi Makmal Kesihatan Awam) system, which is assumed to be the source of this data. As mentioned previously, the daily test numbers count tests performed rather than unique individuals, i.e. the same person may take two different types of test (RTK antigen for timeliness and RT-PCR as verification).
However, as will be observed in the data below, testing in general has not grown fast enough to cope with rising case numbers. Previously, MoH stated its focus on RT-PCR testing, deemed the gold standard. That has since changed, as more RTK antigen tests have been performed. In light of recent events, hopefully more field testing can be done using the lateral flow test kits that are now widely and cheaply available. At the policy level this could be done by distributing a small number of test kits, perhaps 1 or 2 per household, to be used by the individual who goes out most frequently, especially in areas with high and growing case numbers. Confirmation and verification could still be done using RT-PCR.
*Will be changed accordingly if daily positive test numbers are made available.
Observations:
*Note this is not the true positivity rate but an approximation due to the lack of data as mentioned earlier.
5% positive line shows minimum number of tests required for 5% positive rate.
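The approximation and the 5% line reduce to two small formulas, sketched here in Python. The function names are illustrative, and the test count in the example is made up; only the 17,000 case figure and the 5% threshold come from the text.

```python
def approx_positive_rate(daily_cases, daily_tests):
    """Approximate positivity: reported cases divided by tests on the same day.

    This is not the true positivity rate, because cases and tests
    are not time matched.
    """
    if daily_tests <= 0:
        raise ValueError("daily_tests must be positive")
    return daily_cases / daily_tests


def min_tests_for_target(daily_cases, target_rate=0.05):
    """Minimum number of tests needed to keep the rate at or below target_rate."""
    return daily_cases / target_rate


cases = 17_000   # order of magnitude mentioned in the text
tests = 150_000  # made-up number of tests, for illustration only
print(f"approximate positive rate: {approx_positive_rate(cases, tests):.1%}")
print(f"tests needed for 5%: {min_tests_for_target(cases):,.0f}")
```

For 17,000 daily cases, the 5% line sits at about 340,000 tests per day.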
Chart above shows the correlation between daily tests and daily cases. Some data may be missing due to delay in data from the official repository.
Observations:
In Malaysia, the national COVID-19 vaccination program is run as PICK (Program Imunisasi COVID-19 Kebangsaan, or National COVID-19 Immunisation Programme), established under the COVID-19 Immunisation Task Force (CITF). Most vaccines require two doses, although there has also been news regarding additional booster shots to maintain immune levels, and some manufacturers have applied for approval.
The program began on 24th February 2021 and is split into 3 phases [1].
Up until 7th August 2021, a total of 24 million doses had been delivered, comprising both first and second doses. The rollout had a slow start [2] due to initial procurement and delivery issues but has since ramped up to deliver on average 500,000 doses per day (as of 7th August 2021). There have been some issues in administering the vaccine, including jabs with empty syringes or with the plunger not being pressed [3]. CITF has taken a firm stand and issued a statement that it will investigate any reports and punish anyone found to be involved in misconduct. As a result, the task force has allowed video recordings while the vaccine is being administered.
It is important to note that while vaccination reduces the likelihood of infection, albeit minimally, the main outcome is the increased likelihood of preventing serious illness and death [4]. Therefore, even with high vaccination numbers, the spread of COVID-19 must still be managed, as there are sections of the population that have not been and most likely will not be vaccinated, such as pregnant and breastfeeding mothers, children and those ineligible due to medical conditions. Any infection in these groups may still result in death. It is hoped that a highly vaccinated population will reduce congestion in hospitals and quarantine centers, allowing hospitalized care for those in need.
In Malaysia, these severe cases are classified as Category 4 (pneumonia and requires oxygen) and Category 5 (critical and requires ventilator), whereas Category 1 (asymptomatic) and Category 2 (mild symptoms) are deemed mild. Category 3 (pneumonia) requires medical observation [5]. The government has recently announced that these will form part of the indicators for relaxed SOP, allowing fully vaccinated persons (14 days after the 2nd dose) to dine in at restaurants and travel. Hopefully, as the population has had more than a year of experience with this, people will take a cautious approach and still limit their exposure.
[1] https://www.vaksincovid.gov.my/en/phase/
[2] https://www.freemalaysiatoday.com/category/nation/2021/06/21/khairy-tells-why-vaccine-supplies-have-been-slow/
[3] https://www.scmp.com/week-asia/health-environment/article/3141849/malaysias-empty-syringe-incidents-may-fuel-covid-19
[4] https://www.who.int/news-room/feature-stories/detail/vaccine-efficacy-effectiveness-and-protection
[5] https://kpkesihatan.com/2021/07/22/kenyataan-akhbar-kpk-22-julai-2021-situasi-semasa-jangkitan-penyakit-coronavirus-2019-covid-19-di-malaysia/
Chart above shows the daily doses administered with a 14-day moving average highlighted.
Observations:
The vaccination rate has increased steadily, and it is hoped that it will stay at these levels until a significant majority of the population has been vaccinated.
The chart above shows the number of doses that have been administered against the number of registered eligible recipients. This allows a more holistic view by tracking the moving target of registered recipients as well as comparing the remaining doses to be delivered. As a comparison, the population of adults above 18 years old is 23,409,600, as indicated by the horizontal line.
Observations:
Note that the data does not clearly indicate whether the registered recipients are entirely citizens or include foreign nationals. The success of the vaccination program will depend on as many of the population as possible being vaccinated, irrespective of their citizenship or immigration status, including undocumented or illegal immigrants.
The chart above is similar to the previous cumulative chart but uses percentages of the total population. The intention here is to present registered and received vaccinations as percentages of the entire population, including those who are not eligible for vaccination. Total doses administered has also been removed as it has no bearing on this view of vaccine coverage.
The population of Malaysia above 18 years old is 23,409,600, which represents 71.68% of the entire population of 32,657,400.
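The percentages in this view are simple ratios against the total population. The short Python sketch below reproduces the adult share from the two population figures above; the vaccination counts in the example are hypothetical placeholders, not values from the dataset.

```python
ADULT_POPULATION = 23_409_600  # adults above 18, from the text
TOTAL_POPULATION = 32_657_400  # entire population, from the text


def pct_of(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


adult_share = pct_of(ADULT_POPULATION, TOTAL_POPULATION)
print(f"adults as share of total population: {adult_share:.2f}%")

# Hypothetical cumulative counts, for illustration only.
first_dose = 15_000_000
second_dose = 9_000_000
print(f"at least one dose: {pct_of(first_dose, TOTAL_POPULATION):.1f}% of total population")
print(f"two doses: {pct_of(second_dose, TOTAL_POPULATION):.1f}% of total population")
```

Dividing by the total population rather than the adult population gives lower percentages, which is the point of this view: it includes those not eligible for vaccination.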
Observations:
The population data was provided as part of the dataset and is stated to have been sourced from DOSM (Department of Statistics Malaysia). The number does not indicate whether it comprises only citizens or also includes foreign nationals. For now it is assumed that the data represents only citizens and permanent residents.
The chart above represents the influence of vaccination on deaths. An important dimension included here is the cumulative cases, since deaths are also influenced by the case numbers as they are proportionate (more cases ~ more deaths). Another dimension in this chart is the rate of change: the distance between circles represents the daily increase along the respective axis. For example, dose 1 is more spread out along the x-axis (Cumulative Doses), indicating a surge in vaccinations delivered per day, as seen between 8 million and 10 million doses. Similarly, if the circles are spread out along the vertical axis (Cumulative Cases), this represents a surge in daily cases.
The aim of this chart is to present visually how vaccination influences deaths, with the cases shown as well since they are related. As vaccination continues, it is expected that deaths will decrease (circle size will decrease). Both dose 1 and dose 2 data are presented here to show any difference between the doses. From the medical explanation, two doses provide the best likelihood of avoiding death, therefore it is expected that the 2nd dose will show fewer deaths as it moves towards the right. In the ideal scenario, the cases will rise slowly along the vertical axis while moving to the right quickly.
Note that the sequence of circle sizes is the same for both doses, as they both represent the deaths for a given day. Cumulatives (totals) are used as observations are not matched (deaths are not directly linked to the cases for the day).
Similar to the previous chart, the chart above attempts to describe the relationship between cases, vaccinations and deaths. Again, cumulatives (totals) are used as each observation is not directly linked.
The MySejahtera app was launched on 20th April 2020 as part of an effort to simplify contact tracing. Its use has also been expanded to vaccine registration. The published data for check-ins runs from 1st December 2020 until the present.
The published MySejahtera data is aggregated at national or state level at daily intervals in order to protect the privacy of users. However, with the granularity of data available to the authorities, it is possible to develop a graph network of infected cases by collating it with the recorded cases. From there it would be possible to use machine learning to model the likelihood of infection for each individual based on the location and duration of stay in any premises (this again underlines the highly sensitive nature of the data being collected). Beyond that, another model could be developed for specific locations, given that an outbreak in a location is likely to repeat as the virus is passed on, especially now that asymptomatic cases are increasing. Another potential data source could be an estimate of when the person was infected, based on the viral load upon confirmation of a positive test. This would help to determine index cases and execute contact tracing more effectively. Furthermore, an analysis of the role of the environment as a factor in the infection rate may also be possible, e.g. a comparison of the numbers infected in office spaces versus wide shopping malls.
Some potential sources of error include:
In March 2020, Covid Watch, a group of researchers, published a whitepaper detailing an automatic decentralized contact tracing mobile app that protects user anonymity. The project has since received funding and the app has launched in the US. The app communicates via Bluetooth with nearby phones and exchanges random numbers (possibly hashes) as contact events. If a person tests positive, a code is provided to the user to allow these random numbers to be uploaded to a server, where other phones check whether any of their own random numbers match. The app notifies the owner if they are a close contact. All of this data can be made public without any sensitive information being discovered, as none was provided, not even the location.
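The exchange-and-match idea can be illustrated with a toy simulation in Python. This is not the Covid Watch protocol or its code: the `Phone` class, the `meet` helper and the choice to upload the numbers a phone has broadcast are assumptions made purely to show why no identity or location is needed.

```python
import secrets


class Phone:
    """Toy model of a phone taking part in decentralized contact tracing."""

    def __init__(self):
        self.sent = []      # random numbers this phone has broadcast
        self.heard = set()  # random numbers received from nearby phones

    def broadcast(self):
        token = secrets.token_hex(16)  # fresh random number per contact event
        self.sent.append(token)
        return token

    def receive(self, token):
        self.heard.add(token)

    def is_close_contact(self, published):
        # Matching happens on the phone itself, against the public list.
        return not self.heard.isdisjoint(published)


def meet(a, b):
    """Simulate two phones within Bluetooth range exchanging random numbers."""
    b.receive(a.broadcast())
    a.receive(b.broadcast())


alice, bob, carol = Phone(), Phone(), Phone()
meet(alice, bob)  # carol never meets alice

# Alice tests positive and uploads her random numbers to the server.
server = set(alice.sent)

print("bob notified:", bob.is_close_contact(server))
print("carol notified:", carol.is_close_contact(server))
```

The server only ever holds random numbers, so the published list reveals neither who was infected nor where the contact took place.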
Observations:
As a note, the MySejahtera location QR code may not be available at all premises, as there are possibly some locations, particularly outside of the big cities, that rely mostly on manual logbooks. Also, some events may require their own QR code, e.g. weddings.
Observations:
It is possible to assume that downloading and using the app indicates compliance, and that the same users are more likely to comply with the SOP and the rules of the MCO. Conversely, those who have not downloaded the app (who represent two-thirds of the population) are not represented in this data and their compliance is unknown.
Relationship between daily check-ins and daily cases
Observations:
Based on intuition, there is some expectation that check-ins should rise and fall in an inverse relationship to the case numbers. However, this does not always seem to hold true. As seen in the previous chart, the inverse relationship is visible during MCO2, while MCO3 was relatively flat.
Relationship between daily check-ins and daily cases split according to event
Observations:
The chart above describes the average number of check-ins per person. It is derived from dividing the total check-in by the active unique ID in a day.
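The derivation is a single division per day, sketched below in Python. The function name and the numbers are illustrative assumptions; they are not taken from the published dataset.

```python
def average_checkins(total_checkins, active_units):
    """Average check-ins per active unit (unique IDs or unique locations) in a day."""
    if active_units <= 0:
        raise ValueError("active_units must be positive")
    return total_checkins / active_units


# Made-up figures for a single day, for illustration only.
total_checkins = 20_000_000
active_unique_ids = 8_000_000

print(average_checkins(total_checkins, active_unique_ids))  # 2.5 check-ins per person
```

The same function applies to any per-day denominator, so long as the total and the count of active units refer to the same day.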
Observations:
Some interesting points to note concern the prevalence of these QR codes. Earlier on, these were mainly for shops or office premises, on a per entry basis. For shopping malls and most buildings, a QR code would also be placed at each entrance of the building. In recent times, entering a shopping mall may require 2-3 check-ins: once to enter the parking area, once to enter the mall itself, and another if you are required to exit from the parking building to the mall building. Obviously this will affect the average value.
The chart above describes the average number of check-ins per active unique location. It is derived by dividing the total check-ins by the active unique locations in a day. While the dataset specifies that a unique location indicates premises, conservatively it is possible to assume that each QR code represents a specific location instead of an entire premises. This affects the understanding of the data: if a person goes to several shops within one premises (e.g. supermarket + restaurant + pharmacy), ideally these would be counted as a single location, yet this quite possibly is not the case.
Observations:
Data is also available for total check-ins at 30-minute intervals.
This chart shows the check-ins at each 30-minute interval. Zooming in will provide a better view of the daily check-in patterns.
The chart above shows the check-ins for every 30 minutes, layered day by day. The colors run from light to dark to indicate the progression of days, as shown in the legend (e.g. light green is earlier and dark green is later).
Observations:
This view is a side view of the previous chart. The data has been trimmed to on-the-hour values to improve visibility.
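Trimming the 30-minute series to on-the-hour values amounts to keeping every slot whose minute component is zero. A minimal Python sketch follows, assuming each record is a (time of day, check-ins) pair; that layout and the counts are hypothetical and may differ from the published file.

```python
from datetime import time


def on_the_hour(records):
    """Keep only (time, count) pairs whose time falls exactly on the hour."""
    return [(t, count) for t, count in records if t.minute == 0]


# Made-up half-hourly check-in counts, for illustration only.
half_hourly = [
    (time(8, 0), 120_000),
    (time(8, 30), 150_000),
    (time(9, 0), 180_000),
    (time(9, 30), 170_000),
]

for slot, count in on_the_hour(half_hourly):
    print(slot.strftime("%H:%M"), count)
```

This halves the number of points per day, which is what makes the side view easier to read.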
Observations:
The chart above is a 3D view of the check-ins which combines the two previous charts of changes over time and day on day. It better reflects the check-ins at different times of the day and how it progressively changes over time.
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.6 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=en_SG.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_SG.UTF-8 LC_COLLATE=en_SG.UTF-8
## [5] LC_MONETARY=en_SG.UTF-8 LC_MESSAGES=en_SG.UTF-8
## [7] LC_PAPER=en_SG.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_SG.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] pals_1.7 lubridate_1.7.10 RcppRoll_0.3.0 stringr_1.4.0
## [5] RColorBrewer_1.1-2 plotly_4.9.4.1 ggplot2_3.3.5 tidyr_1.1.3
## [9] dplyr_1.0.7 pacman_0.5.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.7 pillar_1.6.2 compiler_3.6.3 jquerylib_0.1.4
## [5] tools_3.6.3 digest_0.6.28 viridisLite_0.4.0 jsonlite_1.7.2
## [9] evaluate_0.14 lifecycle_1.0.0 tibble_3.1.3 gtable_0.3.0
## [13] pkgconfig_2.0.3 rlang_0.4.12 mapproj_1.2.7 crosstalk_1.1.1
## [17] yaml_2.2.1 xfun_0.27 fastmap_1.1.0 httr_1.4.2
## [21] withr_2.4.2 knitr_1.36 maps_3.3.0 generics_0.1.0
## [25] vctrs_0.3.8 htmlwidgets_1.5.3 grid_3.6.3 tidyselect_1.1.1
## [29] glue_1.4.2 data.table_1.14.0 R6_2.5.1 fansi_0.5.0
## [33] rmarkdown_2.11 farver_2.1.0 purrr_0.3.4 magrittr_2.0.1
## [37] scales_1.1.1 ellipsis_0.3.2 htmltools_0.5.2 dichromat_2.0-0
## [41] colorspace_2.0-2 utf8_1.2.2 stringi_1.7.5 lazyeval_0.2.2
## [45] munsell_0.5.0 crayon_1.4.1